PROVIDING INFORMATION IN
RESPONSE TO SPOKEN REQUESTS

Background 

This invention relates generally to providing information in response to 
spoken requests. 

Electronic programming guides provide a graphical user interface on a 
television display for obtaining information about television programming. 
Generally, an electronic programming guide provides a grid-like display which 
lists television channels in rows and programming times corresponding to those 
channels in columns. Thus, each program on a given channel at a given time is 
provided with a block in the electronic programming guide. The user may select
particular programs for viewing by using a remote control to click on a
highlighted program in the electronic programming guide.

While electronic programming guides have a number of advantages, they 
also suffer from a number of disadvantages. For one, as the number of 
television programs increases, the electronic programming guides become 
somewhat unmanageable. There are so many channels and so many programs 
that providing a screen sized display of the programming options becomes 
unworkable. 

In addition, the ability to interact remotely with the television screen 
through a remote control is somewhat limited. Basically, the selection technique 
involves using a remote control to move a highlighted bar to select the desired 
program. This is time consuming when the number of programs is large. 






Thus, there is a continuing need for a better way to provide information in 
response to spoken requests. 

Brief Description of the Drawings
Figure 1 is a schematic depiction of software modules utilized in 
accordance with one embodiment of the present invention; 

Figure 2 is a schematic representation of the generation of a state vector 
from components of a spoken query and from speech generated by the system 
itself in accordance with one embodiment of the present invention; 

Figure 3 is a flow chart for software for providing speech recognition in
accordance with one embodiment of the present invention;

Figure 4 is a schematic depiction of the operation of one embodiment of
the present invention including the generation of in-context meaning and dialog
control;

Figure 5 is a flow chart for software for implementing dialog control in
accordance with one embodiment of the present invention;

Figure 6 is a flow chart for software for implementing structure history
management in accordance with one embodiment of the present invention;

Figure 7 is a flow chart for software for implementing an interface between
a graphical user interface and a voice user interface in accordance with one 
embodiment of the present invention; 

Figure 8 is a conversation model implemented in software in accordance 
with one embodiment of the present invention; 

Figure 8A is a flow chart for software for creating state vectors in one 
embodiment of the present invention; 



Figure 9 is a schematic depiction of hardware for implementing one
embodiment of the present invention; and

Figure 9A is a front elevational view of one embodiment of the present
invention.

Detailed Description
An application may respond to conversational speech, with spoken or
visual responses including using graphical user interfaces, in accordance with
one embodiment of the present invention. In some embodiments of the present
invention, a limited domain may be utilized to increase the accuracy of speech
recognition. A limited or small domain allows focused applications to be
implemented wherein the recognition of speech is improved because the
vocabulary is limited.

While the present invention is illustrated using an electronic
programming guide application, it is not limited to any particular application.
Instead it may be applied in a myriad of applications that benefit from speech
recognition.

A variety of techniques may be utilized for speech recognition. However,
in some embodiments of the present invention, the process may be simplified by
using surface parsing. In surface parsing, questions or statements are handled
separately and there is no movement to convert questions into the same subject,
verb, object order as a statement. As a result, conventional, commercially
available software may be utilized for some aspects of speech recognition with
surface parsing. However, in some embodiments of the present invention, deep
parsing with movement may be more desirable.



responsive system — «- -* - ^^Cop - — 
variously phased requests, to use conversat, 0 na M«y 

. trark toDics as topics change and to use recipe y 

may be utilized which may be ariar to conve 



recognizer is not running. The once mode means that the system only listens for
one query. After successfully recognizing a request,
mode. The always mode means that the system
for queries. After answering one query, the system starts listening again.

A nsten state machine utted on ^ ^ 

has said or has re^cte d . * - ^ ^ ^ ^ ^ ^ ^ 

add itseif as a stener o *at ^ ^ ^ ^ ^ 

» ^ US6r - ^ r ^ t r e stening state, *e system is listening to the 
the ^em ,s not « * g ^ ^ ^ ^ ^ ^ 

the present invention. In the illustrated em The application 16 

18. The database 18 may receive inquiries from the voice user interface 12 and
the graphical user interface 14. The graphical and voice user interfaces may be
synchronized by synchronization events.

The voice user interface 12 may also include a speech synthesizer 20, a
speech recognizer 21 and a natural language understanding (NLU) unit 10. In

a synthesizer. The voice user interface 12 may include a grammar 10a which
may be utilized by the recognizer 21.

A state vector is a representation of the meaning of an utterance by a
user. A state vector may be composed of a set of state variables. Each state
variable has a name, a value and two flags. An in-context state vector may be
developed by merging an utterance vector which relates to what the user said
and a history vector. A history vector contains information about what the user

arising, for example, from the use of pronouns. The ambiguity in the utterance
vector may be resolved by resorting to a review of the history vector and
particularly the information about what the user said in the past.

In any state vector, including utterance, history or in-context state
vectors, the state variables may be classified as SELECT or WHERE variables
(borrowing the terms SELECT and WHERE from the SQL database language).
SELECT variables represent information a user is requesting. In other words, the
SELECT variable defines what the user wants the system to tell the user. This
could be a show time, length or show description, as examples.

WHERE variables represent information that the user has supplied. A
WHERE variable may define what the user has said. The WHERE variable
provides restrictions on the scope of what the user has asked for. Examples of
WHERE variables include show time, channel, title, rating and genre.

The query "When is X-Files on this afternoon?" may be broken down as
follows:

Request: When (from "When is X-Files on this afternoon?")
Title: X-Files
Part_of_day_range: afternoon

The request (when) is the SELECT variable. The WHERE variables include the
other attributes including the title (X-Files) and the time of day (afternoon).

The information to formulate responses to user queries may be stored in a
relational database in one embodiment of the present invention. A variety of
software languages may be used. By breaking a query down into SELECT
variables and WHERE variables, the system is amenable to programming in
well-known database software such as Structured Query Language (SQL). SQL is a
standard language for relational database management systems. In SQL, the
SELECT variable selects information from a table. Thus, the SELECT command
provides the list of column names from a table in a relational database. The use
of a WHERE command further limits the selected information to particular rows
of the table. Thus, a bare SELECT command may provide all the rows in a table
and the combination of a SELECT and a WHERE command may provide less than
all the rows of a table, including only those items that are responsive to both
the SELECT and the WHERE command. Thus, by resolving spoken queries into
SELECT and WHERE variables, the programming may be facilitated in some
embodiments of the present invention.
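
As a minimal sketch of how SELECT and WHERE variables might map onto a database query, the following Python example builds a parameterized SQL statement from a state vector. The table name `shows`, its columns, the dictionary layout of the state vector and the function `build_query` are all assumptions made for illustration; they are not taken from the embodiment described here.

```python
import sqlite3

# Hypothetical state vector for "When is X-Files on this afternoon?".
# SELECT variables name what the user wants; WHERE variables restrict the rows.
state_vector = {
    "select": ["start_time"],
    "where": {"title": "X-Files", "part_of_day": "afternoon"},
}

# Assumed schema; column names are whitelisted because they are
# interpolated into the SQL text (values are always bound as parameters).
ALLOWED_COLUMNS = {"title", "channel", "start_time", "part_of_day", "genre"}


def build_query(vector):
    """Turn SELECT/WHERE state variables into (sql, params)."""
    select_cols = vector["select"]
    where = vector["where"]
    for col in list(select_cols) + list(where):
        if col not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {col}")
    sql = "SELECT " + ", ".join(select_cols) + " FROM shows"
    params = []
    if where:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in where)
        params = list(where.values())
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shows (title TEXT, channel INTEGER, start_time TEXT, "
    "part_of_day TEXT, genre TEXT)"
)
conn.executemany(
    "INSERT INTO shows VALUES (?, ?, ?, ?, ?)",
    [
        ("X-Files", 58, "15:00", "afternoon", "drama"),
        ("X-Files", 58, "21:00", "evening", "drama"),
        ("Xena", 12, "14:00", "afternoon", "action"),
    ],
)

sql, params = build_query(state_vector)
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)

# A bare SELECT (no WHERE variables) returns every row of the table.
all_titles = conn.execute(*build_query({"select": ["title"], "where": {}})).fetchall()
print(len(all_titles))
```

With the sample rows above, the WHERE variables narrow three rows down to the single afternoon showing of X-Files, while the bare SELECT returns all three.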

Referring to Figure 2, a user request or query 26 may result in a state
vector 30 with a user flag 34 and a grounding flag 32. The user flag 34 indicates
whether the state variable originated from the user's utterance. The grounding
flag 32 indicates if the state variable has been grounded. A state variable is
grounded when it has been spoken by the synthesizer to the user to assure
mutual understanding. The VUI 12 may repeat portions of the user's query back
to the user in its answer.

Grounding is important because it gives feedback to the user about 
whether the system's speech recognition was correct. For example, consider the 
following spoken interchange: 

1. User: "Tell me about X-Files on Channel 58". 

2. System: "The X-Files is not on Channel 50". 

3. User: "Channel 58". 

4. System: "On Channel 58, an alien..."

At utterance number 1, all state variables are flagged as from the user
and not yet grounded. Notice that the speech recognizer confused fifty and
fifty-eight. At utterance number 2, the system has attempted to repeat the title
and the channel spoken by the user and they are marked as grounded. The act of
speaking parts of the request back to the user lets the user know whether the
speech recognizer has made a mistake. Grounding enables correction of
recognition errors without requiring re-speaking the entire utterance. At
utterance number 3, the user repeats "58" and the channel is again ungrounded.
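
The bookkeeping in this interchange can be sketched in Python as follows. The class `StateVariable` and the helpers `hear` and `speak_back` are hypothetical names chosen for illustration; only the idea of a name, a value, a user flag and a grounding flag comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class StateVariable:
    name: str
    value: object
    from_user: bool = True   # user flag: did the value come from the user's utterance?
    grounded: bool = False   # grounding flag: has the system spoken it back?


def hear(vector, name, value):
    """Record a value recognized from the user: from the user, not yet grounded."""
    vector[name] = StateVariable(name, value, from_user=True, grounded=False)


def speak_back(vector, names):
    """Mark variables as grounded once the synthesizer has repeated them."""
    for name in names:
        vector[name].grounded = True


vector = {}

# Utterance 1: the recognizer mishears "58" as "50".
hear(vector, "title", "X-Files")
hear(vector, "channel", 50)

# Utterance 2: the system repeats the title and the (wrong) channel.
speak_back(vector, ["title", "channel"])

# Utterance 3: the user corrects only the channel, which becomes ungrounded again.
hear(vector, "channel", 58)
ungrounded = [v.name for v in vector.values() if not v.grounded]
print(ungrounded)

# Utterance 4: the system answers using channel 58, grounding it.
speak_back(vector, ["channel"])
```

Because the title stays grounded after utterance 2, only the channel needs to be repeated, which is the point of grounding in the interchange above.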
9r ° Und ;Ln 9 ne« to ^ 3, so^e 3 6 , ^

use of an application program interface (API) in one embodiment of the present
invention. For example, the JAVA speech API

vector as indicated in block 42.

In one embodiment of the present invention, the JAVA speech

the — — • - « an r " - « 
using JAVA. The history vector 46 and an utterance vector 44 may be merged by
structural history management software 62 to create the in-context meaning
vector 48.

^The context meaning vector 48 is created by 
pro nouhS which are — used in conversation, speech. The^ntext 
meaning vector Is then used as the new histon, vector. Thus, the system 
lies the pronouns by using the speech histo^ vector to gain an 

The in-context meaning vector 48 is then provided to dialog control
software 52. The dialog control software 52 uses a dialog control file to control
the flow of the conversation and to take certain actions in response to the
in-context meaning vector 48.

me datTb se » and a language generation module 50. Prior to th = 

methods (See Table I infra). 

Thus referring to Figure 5, the dialog control software 52 initially 

the ability to track topic changes. 
Each row in the dialog control file includes a state pattern and one or more
semantic actions. The in-context meaning vector 48 is compared

matches the state vector (diamond 58), then

Each semantic action 60 returns one of three values: CONTINUE, STOP
and RESTART as indicated
57, and the flow iterates.

If the RESTART value is returned, the system

(block 54). If the STOP value is returned, the system

qu , es . o^ t ^^z:j^- - 

state vector which has the records returned ^ ^ ^ 



Table I

Line   Function
1      nfound
2      giveHelp
3      turnOnTV
4      turnOffTV
5      tuneTV
6      defaultTime
7      checkDBLimits
8      queryDB
9      relaxConstraints
10     queryDB
11     saySorry
12     giveAnswer
13     giveChoice



Thus in the table above, the state patterns at lines 2-5 are basic functions

At line six, the state pattern checks to see if the time attribute is defined.
If not, a function called defaultTime() is called to determine what
the appropriate time should be, set the time attribute, and return a

At line seven, the pattern is empty so the function checkDBLimits() is
called. A time range in the user's request is checked against the time range
in the database. If the user's request extends beyond the end of the

range. A CONTINUE value is returned.
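
One possible reading of this table-driven control flow is sketched below in Python. Several points are assumptions rather than statements of the embodiment: that rows are tried in order, that CONTINUE moves on to the next row, that RESTART goes back to the first row, and that STOP ends processing. The row patterns, the action bodies and the sample `SHOWS` data are likewise hypothetical; only the function names echo Table I.

```python
CONTINUE, STOP, RESTART = "CONTINUE", "STOP", "RESTART"

# Hypothetical stand-in for the program database.
SHOWS = [
    {"title": "X-Files", "channel": 58, "start": "15:00"},
    {"title": "Xena", "channel": 12, "start": "14:00"},
]


def default_time(vector):
    vector["time"] = "now"          # assumed default when no time was given
    return CONTINUE


def check_db_limits(vector):
    return CONTINUE                 # placeholder: a real check would clamp the time range


def query_db(vector):
    vector["results"] = [s for s in SHOWS if s["title"] == vector.get("title")]
    return CONTINUE


def say_sorry(vector):
    vector["reply"] = "Sorry, nothing matched."
    return STOP


def give_answer(vector):
    show = vector["results"][0]
    vector["reply"] = f"{show['title']} is on channel {show['channel']} at {show['start']}."
    return STOP


def give_choice(vector):
    vector["reply"] = "Several shows matched."
    return STOP


# Each row pairs a state pattern (a predicate, or None for an empty pattern
# that always matches) with a semantic action.
ROWS = [
    (lambda v: "time" not in v, default_time),
    (None, check_db_limits),
    (None, query_db),
    (lambda v: not v.get("results"), say_sorry),
    (lambda v: len(v.get("results", [])) == 1, give_answer),
    (lambda v: len(v.get("results", [])) > 1, give_choice),
]


def run_dialog(rows, vector, max_restarts=10):
    """Walk the rows in order, running the action of each matching row."""
    trace, restarts, i = [], 0, 0
    while i < len(rows) and restarts <= max_restarts:
        pattern, action = rows[i]
        if pattern is None or pattern(vector):
            result = action(vector)
            trace.append(action.__name__)
            if result == STOP:
                break
            if result == RESTART:
                i, restarts = 0, restarts + 1
                continue
        i += 1
    return trace


vector = {"request": "when", "title": "X-Files"}
trace = run_dialog(ROWS, vector)
print(trace)
print(vector["reply"])
```

Keeping the patterns and actions in a table means the conversation flow can be changed by editing rows rather than control logic, which appears to be the motivation for a separate dialog control file.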
stored in history.

The system tracks topic changes. If the user says something that clears

rir:::::— . — — — :::: d0 

the history. This permits the following sequence of user
commands to work properly.

1. "When is X-Files on", 

2. "Turn on the TV",

puts the X-Files show into the history. The second

turns on the television. Since it is an

sentence does not erase the history. Thus, the pronoun "it" in the
the Nth title. If the user utterance
selects a number
in which the meaning

WHERE variables. If so, history is not needed to derive the in-context
meaning as indicated in block 74.

Otherwise, a check determines whether the 

indicated in block 84. 

As an example, assume that the history vector is as follows:
Request: When (from "When is X-Files on this afternoon?")
Title: X-Files

the following attributes: 

Request: Channel (from "What channel is it on?")
there is a SELECT attribute but no WHERE attribute in the user
query. As a result, the history vector is needed



Title: X-Files 

Part_of_day_range: afternoon. 

Request: About (from "What is X-Files about?")

Assume the user then asks "How about Xena?" which has the following
attributes:

history vector:
Request: About (from "What is Xena about?")

^TtZZtlZ* , meachcasethe 



meaning. 
If an utterance has only a WHERE variable, then the in-context meaning
vector is the same as the history vector with the utterance's WHERE variable
inserted into the history vector. If the utterance

neither a SELECT nor a WHERE variable, then the in-context meaning vector is the

channel, description, rating and genre.
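
The merging behavior described in the examples above can be sketched in Python as follows. The dictionary layout and the function name `merge` are assumptions made for illustration. The SELECT-only case is inferred from the "What channel is it on?" example, and the handling of an utterance with neither kind of variable (returning the history unchanged) is an assumption, since that part of the description is incomplete.

```python
def merge(utterance, history):
    """Combine an utterance vector with a history vector into an
    in-context meaning vector (all three are plain dicts here)."""
    has_select = bool(utterance.get("select"))
    has_where = bool(utterance.get("where"))

    if has_select and has_where:
        # The utterance is complete on its own; history is not needed.
        return {"select": utterance["select"], "where": dict(utterance["where"])}

    if has_select:
        # Only a request ("What channel is it on?"): borrow the
        # restrictions from the history vector.
        return {"select": utterance["select"], "where": dict(history["where"])}

    if has_where:
        # Only restrictions ("How about Xena?"): keep the earlier request
        # and insert the new WHERE values into the history's WHERE values.
        where = dict(history["where"])
        where.update(utterance["where"])
        return {"select": history["select"], "where": where}

    # Neither kind of variable: assumed to leave the history unchanged.
    return {"select": history["select"], "where": dict(history["where"])}


# "When is X-Files on this afternoon?" followed by "What channel is it on?"
history = {"select": "when",
           "where": {"title": "X-Files", "part_of_day_range": "afternoon"}}
channel_query = merge({"select": "channel", "where": {}}, history)
print(channel_query)

# "What is X-Files about?" followed by "How about Xena?"
history2 = {"select": "about", "where": {"title": "X-Files"}}
xena_query = merge({"select": None, "where": {"title": "Xena"}}, history2)
print(xena_query)
```

In both cases the result would then serve as the new history vector for the next utterance.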

More than one show is often under discussion. A collection of shows is

then that show is the SHOW_SET.
If the user is discussing a particular show in the SHOW_SET, it is
indicated as the SELECTED_SHOW attribute. If the attribute is -1, or missing
from the meaning vector, then no show in the SHOW_SET

SHOW_SET and SELECTED_SHOW are

selected by the graphical user interface 14, it fires an event

an array of shows. Optionally, only one of these shows may be selected.
of a synchronization event.

appropriate software language, the statement ^ 

value in the history vector. Iftneym 

consuming database querv again by ^ ^ 

accoun* for two «*«££7 * may be represented by the a 
programming: time and shows. A pom time range variable. 

WA classcalenda, ^^ZZ^L- -Claris 
used to represent time because v 

adding hours, days, etc. her of wnich may 

to be null indicating an open time range.

represented using attributes such as a

and next; DAY_RANGE which includes now, today, tomorrow, Sunday, Monday, ...,
Saturday, next Sunday, ..., last Sunday, ...;

evening; HOUR which may include the numbers one to twelve;
may include the numbers zero to fifty-nine; and AM_PM which includes AM and

For example, the question "Is Star Trek on next Monday at
three in the afternoon?" may be resolved as follows:
Request: When 
Title: Star Trek 
Day_Range: Next Monday 
Part_of_Day_Range: Afternoon 
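
To illustrate how such symbolic time attributes might be turned into a concrete time range, the following Python sketch resolves a Day_Range and a Part_of_Day_Range against a reference date. It uses Python's `datetime` module in place of a calendar class. The hour boundaries for each part of the day, the reading of "next Monday" as the first Monday strictly after the reference date, and the function name `resolve_time_range` are all assumptions; the description above does not define them.

```python
from datetime import date, datetime, time, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

# Assumed hour boundaries (start inclusive, end exclusive).
PART_OF_DAY_HOURS = {"morning": (6, 12), "afternoon": (12, 18), "evening": (18, 24)}


def resolve_time_range(day_range, part_of_day, today):
    """Return (start, end) datetimes for inputs such as
    day_range='Next Monday', part_of_day='Afternoon'."""
    words = day_range.lower().split()
    if len(words) != 2 or words[0] != "next" or words[1] not in WEEKDAYS:
        raise ValueError(f"unsupported day range: {day_range!r}")

    target = WEEKDAYS.index(words[1])
    # Number of days (1..7) until the next occurrence of the target weekday,
    # never counting the reference date itself.
    days_ahead = (target - today.weekday() - 1) % 7 + 1
    day = today + timedelta(days=days_ahead)

    start_hour, end_hour = PART_OF_DAY_HOURS[part_of_day.lower()]
    midnight = datetime.combine(day, time.min)
    return (midnight + timedelta(hours=start_hour),
            midnight + timedelta(hours=end_hour))


# Reference date: Wednesday, 3 January 2024.
start, end = resolve_time_range("Next Monday", "Afternoon", date(2024, 1, 3))
print(start, end)
```

Date arithmetic of this kind (adding hours and days) is the sort of operation for which a calendar abstraction is convenient.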

structure ,s made up of '™ ' ^"when" m of the query. The 

attribute whose value is the x rues . n.u , 

and a value. The data structure may be simplified by ensuring that the
values are simple structures such as integers, strings, lists or other database

the user. The linguistic structure of the query, such as whether it
is a clause or a quantified set, is deliberately omitted in one

accordance with one embodiment of the present invention, receives the
utterance as indicated in block 117. An attribute of the utterance is determined
as indicated in block 118. A non-state vector value is then attached to the

attributes 104 may include a show set and selected show.

attributes iu" y rtf an , France Other components of the 

indicated at 105. The conversation model may also include rules and

and rules 114 in Figure 8 may include a number of methods
used by the unit 10. For example, a method SetSelected() may be used by the
unit 10 to tell the voice user interface 12 what shows have been selected by the

the synthesizer 20 is already speaking, then a Speak() request is queued to
the synthesizer 20 and the method returns immediately.

The method Spea^uietQ may be used by the unit 10 to generate 
synthesizer 20. If the synthesizer is speaking, then the text may be saved,
and spoken when the synthesizer is done speaking the current text.

of a processor-based system for implementing the

system memory 126. The bridge 124 may communicate with

could, for example, be a Peripheral Component Interconnect (PCI) bus in
accordance with Revision 2.1 of the PCI

PCI Special Interest Group, Portland, Oregon 97214. The

coupled to a display controller 132 which drives a display 134 in one

hSZZL. in U 9 -» be implemented as a set-top box 194 as 

r;: j« * ^ ^ ^ - * on and . — , 

television display 134. A A r'cm -nfia where 

A microphone input 136 may lead to the audio codec (AC'97) 136a where
it may be digitized and sent to memory through an audio accelerator 136b. The
AC'97 specification is available from Intel Corporation

the processor 120 may be sent to the audio accelerator 136b and the AC'97
codec 136a and on to the speaker 138.

In some embodiments of the present invention, there may be a problem

program. In some cases, a mute button may be provided, for example in
connection with a remote control 202, in order to mute the audio when voice
requests are being provided.
In accordance with another embodiment of the present

through a wireless interface 206 to the processor-based system
wireless interface 196 in one embodiment of the present
invention. Alternatively, the remote control unit 202 may interface with the
television receiver 134

Bus (USB) coupling 148 (i.e., a device

Implementers Forum Specification, Version 1.0 (www.usb.org)). Finally,

connection 148 may couple to a series of USB hubs 15* 

The EIDE connection 142 may couple to a hard disk drive 146 and a
CD-ROM player 144. In some embodiments, other equipment may be coupled
including a video cassette recorder (VCR), and a digital versatile disk (DVD)

may couple to a serial interface 156 which drives an infrared interface 160 and